Reduction of Dutch Sentences for Automatic Subtitling
Abstract
We compare machine learning approaches to sentence length reduction, used in the automatic generation of subtitles for deaf and hearing-impaired people, with a method that relies on hand-crafted deletion rules. We describe building the necessary resources for this task: a parallel corpus of examples from news broadcasts of the Flemish VRT broadcasting corporation, and a Dutch shallow parser based on the material of the Spoken Dutch Corpus (CGN). We evaluate the sentence simplifiers and discuss their performance.
Similar resources
Automatic Sentence Simplification for Subtitling in Dutch and English
We describe ongoing work on sentence summarization in the European MUSA project and the Flemish ATraNoS project. Both projects aim at automatic generation of TV subtitles for hearing-impaired people. This involves speech recognition, a topic which is not covered in this paper, and summarizing sentences in such a way that they fit in the available space for subtitles. The target language is equa...
Sentence Compression For Automatic Subtitling
This paper investigates sentence compression for automatic subtitle generation using supervised machine learning. We present a method for sentence compression and discuss the generation of training data from compressed Finnish sentences, as well as different approaches to the problem. The method we present outperforms a state-of-the-art baseline in both automatic and human evaluation. On real data, 4...
STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools
We present a modular video subtitling platform that integrates speech/non-speech segmentation, speaker diarisation, language identification, Dutch speech recognition with state-of-the-art acoustic models and language models optimised for efficient subtitling, appropriate pre- and postprocessing of the data and alignment of the final result with the video fragment. Moreover, the system is able to ...
Automatic Classification of Sentences in Dutch Laws
The work described here builds on [1], where we presented a categorisation of norms or provisions in legislation. We claimed that the categories are characterized by the use of typical sentence structures and that this would enable automatic detection and classification. In this paper we present the results of experiments in such automatic classification of provisions. We have defined fourteen ...
Intralingual Open Subtitling in Flanders: Audiovisual Translation, Linguistic Variation and Audience Needs
This article presents an overview of the main findings of an interdisciplinary research project carried out by scholars from a department of translation and interpreting, a department of communication science and a department of linguistics. The project investigated Dutch open subtitling of native speakers of either northern Dutch or a Flemish (regional) variant of Dutch on Flemish television. ...